A Hybrid Approach for Web Document Clustering Using K-means and Artificial Bee Colony Algorithm

Authors

  • M. M. Gowthul Alam
  • S. Baulkani
Abstract

Data volumes keep growing over time, and storing the data in an organised fashion is a major challenge. Document clustering addresses this by grouping related documents together. This paper proposes a web document clustering algorithm, WDC-KABC, to cluster web documents effectively. The proposed algorithm combines features of the K-means and Artificial Bee Colony (ABC) clustering algorithms: ABC is employed as the global search optimizer and K-means is used to refine the solutions, which improves cluster quality. The performance of WDC-KABC is analysed on four datasets (webkb, wap, rec0 and 7sectors) and compared with existing algorithms: K-means, Particle Swarm Optimization, a hybrid of Particle Swarm Optimization and K-means, and Ant Colony Optimization. The experimental results of WDC-KABC are satisfactory in terms of precision, recall, F-measure, accuracy and error rate.
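The division of labour described in the abstract (ABC for global search, K-means for refinement) can be illustrated with a minimal sketch. This is not the authors' WDC-KABC implementation: the abstract does not say where the K-means step is applied, how documents are represented, or which distance is used. The sketch assumes dense feature vectors, squared Euclidean distance, a sum-of-squared-errors objective, and a single K-means refinement of the best food source at the end. All function names and parameters (`sse`, `kmeans_refine`, `abc_kmeans`, `n_sources`, `cycles`, `limit`) are hypothetical.

```python
import numpy as np

def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def kmeans_refine(X, centroids, iters=5):
    """Run a few Lloyd (K-means) iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def abc_kmeans(X, k, n_sources=10, cycles=30, limit=5, seed=0):
    """ABC global search over centroid sets, then K-means refinement (assumed order)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)

    def new_source():
        # A food source is one candidate set of k centroids.
        return rng.uniform(lo, hi, size=(k, X.shape[1]))

    sources = [new_source() for _ in range(n_sources)]
    costs = [sse(X, s) for s in sources]
    trials = [0] * n_sources
    i_best = int(np.argmin(costs))
    best, best_cost = sources[i_best].copy(), costs[i_best]

    def try_neighbour(i):
        # Perturb one coordinate of source i relative to a random partner source.
        partner = rng.choice([j for j in range(n_sources) if j != i])
        cand = sources[i].copy()
        c, d = rng.integers(k), rng.integers(X.shape[1])
        phi = rng.uniform(-1, 1)
        cand[c, d] += phi * (sources[i][c, d] - sources[partner][c, d])
        cost = sse(X, cand)
        if cost < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):  # employed bees
            try_neighbour(i)
        fitness = 1.0 / (1.0 + np.array(costs))
        probs = fitness / fitness.sum()
        for i in rng.choice(n_sources, size=n_sources, p=probs):  # onlooker bees
            try_neighbour(int(i))
        for i in range(n_sources):  # scout bees replace exhausted sources
            if trials[i] > limit:
                sources[i] = new_source()
                costs[i], trials[i] = sse(X, sources[i]), 0
        i_best = int(np.argmin(costs))
        if costs[i_best] < best_cost:
            best, best_cost = sources[i_best].copy(), costs[i_best]

    best = kmeans_refine(X, best)  # local refinement of the global-search result
    return best, sse(X, best)
```

For web documents, the rows of `X` would typically be term-weight vectors (for example TF-IDF), and a cosine-based distance might replace the Euclidean one; both choices are assumptions outside what the abstract states.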

Similar resources

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used so widely across many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and hence can become trapped in a local optimum. In this paper...

An Improved K-Means with Artificial Bee Colony Algorithm for Clustering Crimes

Crime detection is one of the major issues in the field of criminology. In fact, criminology includes knowing the details of a crime and its intangible relations with the offender. In spite of the enormous amount of data on offenses and offenders, and the complex and intangible semantic relationships between this information, criminology has become one of the most important areas in the field o...

Text Clustering Quality Improvement using a hybrid Social spider optimization

Text document clustering is one of the most widely studied data mining problems. It organizes text documents into groups such that each group contains similar text documents. Several issues have been observed when grouping text documents; accuracy and efficiency are the main ones. Recently, as the clustering problem can be mapped to an optimization problem, evolutionary opti...

Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach

The clustering problem under the minimum sum-of-squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and cannot converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...

BeeID: intrusion detection in AODV-based MANETs using artificial Bee colony and negative selection algorithms

Mobile ad hoc networks (MANETs) are multi-hop wireless networks of mobile nodes constructed dynamically without the use of any fixed network infrastructure. Due to inherent characteristics of these networks, malicious nodes can easily disrupt the routing process. A traditional approach to detect such malicious network activities is to build a profile of the normal network traffic, and then iden...

Journal title:

Volume   Issue 

Pages  -

Publication date: 2016